DOCUMENT RESUME

ED 281 865 TM 870 249

AUTHOR Boldt, Robert F.
TITLE Generalization of GRE General Test Validity across
Departments.
INSTITUTION Educational Testing Service, Princeton, N.J.
SPONS AGENCY Graduate Record Examinations Board, Princeton,
N.J.
REPORT NO ETS-RR-86-46; GREB-82-13P
PUB DATE Dec 86
NOTE 27p.
PUB TYPE Reports - Research/Technical (143)
EDRS PRICE MF01/PC02 Plus Postage.
DESCRIPTORS *College Entrance Examinations; Departments; Grade
Point Average; *Graduate Study; Higher Education;
Hypothesis Testing; *Predictive Validity; Sample
Size; *Test Theory; *Test Validity
IDENTIFIERS *Graduate Record Examinations; GRE Validity Study
Service; Range Restriction; *Validity
Generalization



ABSTRACT 

This study of the validity of the Graduate Record
Examinations (GRE) General Test used data from predictive validity
studies that were conducted by the GRE Validity Study Service (VSS)
in 79 graduate departments. The performance criterion was first-year
grades in graduate school. Observed validities were computed, and for
each graduate department validities were also estimated for groups at
two other stages of selection--applicants for admission to the
department, and all GRE takers. Two hypotheses were tested: (1)
General Test's validities were equal across studies; and (2) General
Test's validities had equal ratios across studies, i.e., the level of
validities might vary from institution to institution, but the ratios
would be constant. These hypotheses were applied for VSS groups,
applicant groups, and all GRE takers, and implied validities were
calculated. When the implied validities were compared to the observed
validities, it was found that the assumption of equal validity did
not account well for differences in the level of observed validity of
the GRE General Test. The equal ratio hypothesis accounted for the
observed validities rather well, but departmental discipline was not
significantly related to the degree of fit of observed to implied
validities. At all levels of selection, the study yielded applicant
validities that were predominantly positive. This lends support to
the presumption that the General Test's validity is transportable,
i.e., institutions that do not use the General Test can, if they
adopt it, expect it to prove valid. Appendices include: (1) use of
test theory to present the effects of self selection; (2) use of a
supplementary variable when data are missing for an explicit
selector; (3) generalizing the assumption that the validities are
proportional across institutions; and (4) calculating validities in
the restricted group. (Author/JAZ)






GENERALIZATION OF GRE GENERAL TEST VALIDITY 
ACROSS DEPARTMENTS 



Robert F. Boldt



"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY



TO THE EDUCATIONAL RESOURCES
INFORMATION CENTER (ERIC)."



U.S. DEPARTMENT OF EDUCATION
Office of Educational Research and Improvement
EDUCATIONAL RESOURCES INFORMATION
CENTER (ERIC)

This document has been reproduced as
received from the person or organization
originating it.

Minor changes have been made to improve
reproduction quality.

Points of view or opinions stated in this
document do not necessarily represent official
OERI position or policy.



GRE Board Professional Report No. 82-13P 
ETS Research Report 86-46 



December 1986 



This report presents the findings of a
research project funded by and carried
out under the auspices of the Graduate
Record Examinations Board.



EDUCATIONAL TESTING SERVICE, PRINCETON, NJ






Copyright (c) 1985 by Educational Testing Service. All rights reserved.






GENERALIZATION OF GRE GENERAL TEST VALIDITY ACROSS DEPARTMENTS

Abstract 



This study of the validity of the GRE General Test used data
from predictive validity studies that were conducted by the GRE
Validity Study Service (VSS) in 79 graduate departments. The
performance criterion was first-year grades in graduate school.
Observed validities were computed, and for each graduate department
validities were also estimated for groups at two other stages of
selection--applicants for admission to the department, and all GRE
takers.

Two validity generalization hypotheses were tested. One was
that the General Test's validities were equal across studies; the other
was that the General Test's validities had equal ratios across studies,
that is, that the level of validities might vary from institution
to institution but the ratios would be constant. These hypotheses were
applied for VSS groups, applicant groups, and all GRE takers, and
implied validities (validities that would be observed if the hypotheses
were true) were calculated. When the implied validities were compared
to the observed validities, it was found that the assumption of equal
validity did not account well for differences in the level of observed
validity of the GRE General Test. The equal ratio hypothesis accounted
for the observed validities rather well, possibly due to
overcapitalization on chance, but departmental discipline was not
significantly related to the degree of fit of observed to implied
validities.

At all levels of selection, the study yielded applicant
validities that were predominantly positive. This lends support to the
presumption that the General Test's validity is transportable, i.e.,
institutions that do not use the General Test can, if they adopt it,
expect it to prove valid. In view of the scarcity of very low or
negative validities, studies revealing such validities should be
questioned.



GENERALIZATION OF GRE GENERAL TEST VALIDITY ACROSS DEPARTMENTS



R. Boldt 



Traditionally, research on admissions testing has emphasized
the results of local validity studies, i.e., separate studies using
only data from individual institutions, without regard to data
collected at other, possibly similar, institutions. This practice,
reinforced by the variation in test validities from institution to
institution, has been regarded as consistent with professional
standards for test use, which have embraced the notion that success may
indeed be more predictable at some institutions than at others. These
beliefs were also widely held in applications of testing, in
which test validity was thought to be highly specific to particular
situations. For example, as late as 1975 the American Psychological
Association's Division 14 (Industrial and Organizational Psychology)
stated in its Principles for the Validation and Use of Personnel
Selection Procedures that:

Validity coefficients are obtained in specific situations.
They apply only to those situations. A situation is
defined by the characteristics of the samples of people,
of settings, or criteria, etc. Careful job and
situational analyses are needed to determine whether
characteristics of the site of the original research and
those of other sites are sufficiently similar to make the
inference of generalizability reasonable. (p. 13)



An even more extreme view was espoused by the Equal Employment
Opportunity Commission's (EEOC) Guidelines on Employee Selection
Procedures (1978), which required every use of an employment test to be
validated.

However, research on institutional differences in test
validity (Schmidt and Hunter, 1977; Schmidt, Hunter, Pearlman, and
Shane, 1979; Pearlman, Schmidt, and Hunter, 1980; Schmidt,
Gast-Rosenberg, and Hunter, 1980) led increasingly to awareness that
the effects of numerous presumed-to-be important variables were far
smaller than supposed. In fact, much of the observed variation in test
validity could be explained by artifacts, most notably
error resulting from the use of small samples and differences among
institutions in (a) the effects of selection on the distribution of
test scores and (b) the reliability of the criterion. This growing
awareness was reflected in the 1980 version of the American
Psychological Association's Division 14 Principles, as follows:

Classic psychometric teaching has long held that validity
is specific to the research study and that inability to
generalize is one of the most serious shortcomings of
selection psychology (Guion, 1976). [But] ... current
research is showing that the differential effects of
numerous variables may not be as great as heretofore
assumed. To these findings are being added theoretical
formulations, buttressed by empirical data, which propose
that much of the difference in observed outcomes of
validation research is due to statistical artifacts ...
Continued evidence in this direction should enable further
extensions of validity generalization. (p. 16)



In addition to acceptance by Division 14 of validity evidence
from generalization studies in the personnel sphere, more general
acceptance has been won. The American Educational Research Association
(AERA), the American Psychological Association (APA), and the National
Council on Measurement in Education (NCME) have approved a revised
edition of Standards for Educational and Psychological Testing (AERA,
APA, NCME, 1985). Principle 1.16 in these Standards states:



When adequate local validation evidence is not available,
criterion-related evidence of validity for a specified
test use may be based on validity generalization from a
set of prior studies, provided that the specified test-use
situation can be considered to have been drawn from the
same population of situations on which validity
generalization was conducted.



This increased acceptance of results of prior studies as
evidence of validity at a new site or institution, called validity
generalization, is quite welcome because the GRE Program does indeed
have difficulties in conducting local validity studies. Quite often
the number of cases available from an institution is small. Validity
generalization offers possible relief from this problem because, in
this approach, the number of cases is increased through the pooling of
data from many institutions, and the simplifying assumption about the
relationships among validities reduces the number of parameters to be
estimated.

Three types of approaches to validity generalization have been
discussed in the literature. Schmidt and Hunter, using data gleaned
from the published literature, used single selector range restriction
theory (Gulliksen, 1950, chap. 11; Thorndike, 1982, pp. 298-212) and
classical test theory (Gulliksen, 1950, chap. 3; Lord and Novick, 1968,
part 2). The estimation procedures available to these authors were
limited because various useful data, especially applicant statistics
and test reliabilities, were not available. The Schmidt and Hunter type
of study is most usually associated with the term "validity
generalization," but their analysis was not used for this study because
applicant pool data and test reliabilities were available.



A second approach to validity generalization is that utilized
by Linn and Hastings (1984). They examined law school admissions data
obtained from the Law School Admission Test (LSAT) program, for which
applicant pool data were available. The approach taken by Linn and
Hastings featured regression analyses with the law school as the data
point. They developed regressions of LSAT validities on other
statistics, such as the standard deviation of LSAT scores and the
correlation between LSAT scores and the undergraduate grade point
average for admitted students. This procedure has been used by
others--for example, Baird (1983)--but has not usually been referred to
as validity generalization research. It pertains to the
transportability aspect of validity generalization since regression
formulas could be used to estimate the validity of the LSAT for a
school newly adopting the test, and the correlation coefficient could
indicate the precision of the estimated validity. Since this procedure
uses multiple regressions and entails the validation of various
combinations of variables in the selection of most valid predictor
combinations, a very large number of validity studies are needed to
avoid excessive capitalization on chance; otherwise substantial
overestimates of the amount of validity generalization can result.

A third approach is to use test theory and range restriction
theory, as Schmidt and Hunter did, but to use the multivariate version of
range restriction theory together with applicant data from the
institutions (sites) studied and data from the total pool of examinees.
This approach was available to Linn and Hastings (1984), but they chose
not to use it, citing a standard proposed by Lord and Novick (1968, p.
147). The standard cited is that when the ratio of the test score
standard deviation in the applicant pool to that in the admitted pool
exceeds 1.4, the use of range restriction assumptions is questionable.
A ratio of 2.0 is to be regarded as extreme. Of 154 schools in the
Linn and Hastings study, the ratio of 1.4 was exceeded by 45; for three
of these the ratio exceeded 2.0. With such data it is reasonable to
seek another approach, such as the second one mentioned above, instead
of staying close to the test theory and range restriction approaches of
Schmidt and Hunter.

In graduate education, selection is apparently not as extreme
as in the law school context. The numbers of departments in the
current study for which the ratio of the standard deviation of
applicant group scores to those of the VSS group exceeded 1.4 were 6,
5, and 11 for the verbal, quantitative, and analytical measures,
respectively, distributed over 16 departments. In no case did the
ratio equal or exceed 2.0, and in most cases the ratios that exceeded
1.4 did not exceed it by much. Thus, the degree of selectivity of the
admissions process does not preclude the use of the range restriction
model. The present study uses this third approach to validity
generalization.
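
The Lord and Novick guideline described above can be written as a simple check. The following is a minimal sketch: the function name and the example standard deviations are hypothetical, and only the 1.4 and 2.0 thresholds come from the text.

```python
def restriction_severity(sd_applicants, sd_admitted):
    """Classify range restriction from the ratio of the applicant-pool
    standard deviation to the admitted-group standard deviation."""
    if sd_admitted <= 0:
        raise ValueError("admitted-group standard deviation must be positive")
    ratio = sd_applicants / sd_admitted
    if ratio >= 2.0:
        label = "extreme"        # ratio of 2.0 or more
    elif ratio > 1.4:
        label = "questionable"   # ratio above 1.4 but below 2.0
    else:
        label = "acceptable"     # range restriction assumptions usable
    return ratio, label


# Hypothetical department: applicant SD of 120, VSS-group SD of 80.
ratio, label = restriction_severity(120.0, 80.0)
print(f"ratio = {ratio:.2f} ({label})")
```

Applied to each department and each measure, a check of this kind would reproduce the counts of departments exceeding 1.4 reported in the paragraph above, given the underlying standard deviations.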

Despite these differences, the present study and other
validity generalization studies have a common focus on the distribution
of validities after statistical artifacts are removed. The study
reported here considered four preadmission variables: the verbal,
quantitative, and analytical scores from the GRE General Test, and
undergraduate grade point average (GPA).






Generalization Hypotheses



Because validity generalization research is concerned in part
with the effects of selection on apparent test validity, the present
study considered groups of examinees at three stages of selection. The
first level was that of "all test takers," i.e., those who took the GRE
General Test during a given period of time. This group served as a
standard population on which selection had not yet operated. The
second level was that of "applicant pools," which consisted of General
Test examinees who had applied to the sample of graduate departments
included in the study, and who differed from "all test takers" by
virtue of the effect on true test score distributions of various social
forces that influence application. (The effect of these
forces on the distribution of test scores is discussed in Appendix A.)
The third level consisted of examinees who were admitted to departments
for whom validity studies had been conducted (Graduate Record
Examinations Board, 1985). Having been (a) previously sorted to
applicant groups by self selection, (b) selected by departments on the
basis of scores on the GRE General Test and undergraduate record, and
(c) persistent in completing the first year of college, these "VSS
groups" were therefore the most highly selected of those considered
here.

This study tested the following generalization hypotheses in 
groups at each of the three levels of selection mentioned above: 

1. that validities for a measure were the same for all
institutions, and

2. that, although the validities were not the same for
institutions, the ratios of the scores'
validities (verbal to quantitative and quantitative to
analytical) were the same.



The second hypothesis allowed for variation in criterion 
reliability among departments. 

Procedures 

Samples 

Two convenient sources of GRE data were available: test
analyses and the student history file. Test analyses contain
statistical data describing examinees from particular administrations.
For this project, data from General Test forms 3DGR1, 2, and 3, which
were administered in 1981, were combined to provide estimates of
standard deviations, intercorrelations, and reliabilities for the group
referred to as "all GRE takers." The student history file contains data
on all examinees, including those who ultimately attended departments
that conducted validity studies. General Test scores and undergraduate
GPAs were available, as were responses to the background information
questions (BIQ) on the General Test registration form. For a student's
data to be included as part of a VSS group in this study, a complete
set of BIQ, GRE scores, undergraduate GPA, and first-year graduate
school GPA data had to be on file. Of those whose data had been used by
the Validity Study Service in the past, only 37 percent had complete
data. This resulted in a severe loss of cases, and many studies could
not be used. A lower bound of 25 usable cases was required for the
inclusion of a particular study. Eighty studies qualified on this
ground, but one was subsequently dropped because the standard deviation
of self-reported grades was very much smaller for the applicant pool
than for the VSS group for that institution. (This is an extremely
atypical situation, and one which, when used in the range restriction
computations, led to a negative estimate of test score variance.
Clearly there was something wrong with those data.) Thus, 79 studies
were included in the present research. The total number of cases was
3,832, and the study sizes ranged from 25 to 194, with a mean of 48.5
and an interquartile range of 28 to 54.

The history file was searched for the records of all examinees
who had had scores sent to the departments whose studies were included
in this research. Those for a particular department are referred to
elsewhere in this report as the applicant pool for the department.

Analyses 



Test analyses for forms 3DGR1, 2, and 3 (Wallmark, 1982a,
1982b, 1982c), which were administered in 1981, provided the data to
estimate statistics for all GRE takers. These analyses contained
sample sizes and GRE General Test score means, variances, correlations,
and reliabilities. The within-administration statistics were used to
estimate statistics for the group of all candidates tested in 1981.
Since the reliabilities were available, the variance of the error of
measurement could be computed for an administration as the test
variance times (1 - reliability). A weighted average of these figures
was used as the variance of the error of measurement in the total
test-taking population.
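
The error-variance computation just described can be sketched in a few lines. This is an illustration only: the text says a "weighted average" was used without stating the weights, so weighting by the number of examinees is an assumption, and the function name and example figures are hypothetical.

```python
def pooled_error_variance(administrations):
    """Pool error-of-measurement variances across test administrations.

    administrations: list of (n_examinees, score_variance, reliability).
    Each administration's error variance is variance * (1 - reliability).
    ASSUMPTION: the weights are the numbers of examinees.
    """
    total_n = sum(n for n, _, _ in administrations)
    if total_n <= 0:
        raise ValueError("need at least one examinee")
    weighted_sum = sum(n * var * (1.0 - rel) for n, var, rel in administrations)
    return weighted_sum / total_n


# Hypothetical figures for three administrations (not taken from the report).
example = [
    (30000, 122.0 ** 2, 0.92),
    (25000, 124.0 ** 2, 0.92),
    (20000, 123.0 ** 2, 0.91),
]
print(round(pooled_error_variance(example), 1))
```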

With the exception of data on the undergraduate school
performance of applicants, data on all preadmission variables were
available for each VSS group and for each applicant pool. Thus, data
were available for three selectors (GRE verbal, quantitative, and
analytical scores) in both groups; for one selector (undergraduate
school performance) in only the restricted (VSS) group; and for a
variable subject to selection (self-reported undergraduate school
performance) in both groups. Although this is not the usual
configuration of data available for range restriction computations, it
was sufficient for estimating the variance of the undergraduate school
performance variable and its covariances with scores for the applicant
pools (see Appendix B for formulas). Application of the formulas in
Appendix B provides estimates of the variance-covariance matrices of
selectors--that is, scores and undergraduate school performance--
for both the applicant and VSS groups. Data on graduate grade point
average were available only for the VSS group. This configuration of
data availability, typical in projects that involve correcting for the
effects of selection, allowed the use of standard formulas to estimate
the applicant pool statistics for the graduate grade point average
(Gulliksen, 1950, pp. 165-166; Thorndike, 1982, pp. 260-261).
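
A correction of this kind can be sketched as follows. The code implements the textbook multivariate range restriction correction, which assumes that the regression of the criterion on the selectors and the residual variance are the same in the restricted and unrestricted groups. Whether it matches the cited formulas term for term is an assumption; the function and variable names are hypothetical, and a small Gaussian elimination routine is included so the sketch needs no external library.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("selector covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - partial) / m[i][i]
    return x


def correct_for_selection(v_xx, v_xy, v_yy, w_xx):
    """Estimate criterion statistics in the unrestricted group.

    v_xx: selector covariance matrix in the restricted group
    v_xy: selector-criterion covariances in the restricted group
    v_yy: criterion variance in the restricted group
    w_xx: selector covariance matrix in the unrestricted group
    Returns (w_xy, w_yy, validities) for the unrestricted group.
    """
    n = len(v_xx)
    b = solve(v_xx, v_xy)  # regression weights, assumed invariant
    w_xy = [sum(w_xx[i][j] * b[j] for j in range(n)) for i in range(n)]
    explained_restricted = sum(b[i] * v_xy[i] for i in range(n))
    explained_unrestricted = sum(b[i] * w_xy[i] for i in range(n))
    # Residual variance is assumed invariant under selection.
    w_yy = v_yy - explained_restricted + explained_unrestricted
    validities = [w_xy[i] / math.sqrt(w_xx[i][i] * w_yy) for i in range(n)]
    return w_xy, w_yy, validities


# Hypothetical single-selector case: restricted validity .30, SD ratio 1.5.
print(correct_for_selection([[1.0]], [0.3], 1.0, [[2.25]])[2])
```

With one selector the result reduces to the familiar single-selector correction, which is a convenient sanity check on the matrix version.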

An alternative and much simpler procedure than the one that
uses the formulas of Appendix B was considered--that of using the
self-reported undergraduate school performances as if they were actual
grades. If these variables were highly correlated, this procedure
would have been used. However, the relationship between self-reports
and actual performances was not high enough. The average correlation
of the self-reported with actual undergraduate school performance was
only .23. Its standard deviation was .15, and its maximum was only .61.



The next step was to estimate test score validities for all
GRE General Test takers. In this computation, the applicant pools were
regarded as the restricted groups that were selected from all GRE
takers. For each department, the validities for all GRE takers were
computed. GRE true scores took the role of selectors, with GRE scores
and graduate grade point averages subject to the effects of
selection. (As mentioned previously, the motivation for giving the
true score this role is put forth in Appendix A.) Because the test
statistics for all GRE takers were already computed, it was necessary
only to correct the graduate grade point statistics. For this purpose
the configuration of information was the same as when generalizing
validities to the applicant pool--explicit selector data were available
in both the restricted and unrestricted pools, but data on the variable
subject to selection were present only in the restricted groups. The
correction formulas were the same, but different variables played the
roles. Then, because the covariances with true scores were, according
to test theory, the same as the covariances with actual test scores,
and because the test score statistics for all GRE takers were known,
validities for that group could be computed. After this step,
covariances and correlations were available for all groups at all
levels of selection.
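
The test-theory step used here can be illustrated briefly. Under classical test theory the covariance of the criterion with the true score equals its covariance with the observed score, while the true-score variance is the reliability times the observed variance. The sketch below applies those two facts; the function name and the example numbers are hypothetical.

```python
import math


def true_score_validity(cov_xy, var_x, var_y, reliability):
    """Correlation between test true score and criterion.

    Uses cov(T, Y) = cov(X, Y) and var(T) = reliability * var(X),
    both standard results of classical test theory.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    var_true = reliability * var_x
    return cov_xy / math.sqrt(var_true * var_y)


# Hypothetical standardized example: test score validity .30, reliability .92.
observed = 0.30
corrected = true_score_validity(observed, 1.0, 1.0, 0.92)
print(f"test score validity {observed:.2f}, true score validity {corrected:.3f}")
```

Because the divisor is the square root of the reliability, a reliability near .92 raises a validity by only about four percent.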

Each hypothesis was evaluated by comparing implied VSS group
validities with observed validities. The implied validities were
obtained by computing a simplified set of validities, such as the
average validities, for the groups in which the generalization was
made, and correcting for the effects of selection to obtain the VSS
pool statistics for the particular generalization. The generalization
hypotheses were each applied at three levels of selection--VSS
group, applicant pool, and all GRE takers.

For the VSS groups, the hypothesis of equal validities was
implemented by using the average validity for a measure as its implied
validity for each department. The equal ratio hypothesis was tested
with the formula given in Appendix C, which multiplies the average
validities by a different constant for each department and uses the
result as a different set of implied validities.
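
The two sets of implied validities for the VSS groups can be sketched as follows. The equal validity part follows the description above directly. For the equal ratio part, the appendix formula is not reproduced in this section, so the sketch substitutes a least-squares scaling constant per department; that choice, the function name, and the example data are all assumptions made for illustration.

```python
def implied_validities(observed):
    """Implied validities under the equal validity and equal ratio hypotheses.

    observed: one list per department, e.g. [verbal, quantitative, analytical].
    Returns (means, equal_validity, equal_ratio).
    """
    n_depts = len(observed)
    n_measures = len(observed[0])
    means = [sum(d[j] for d in observed) / n_depts for j in range(n_measures)]

    # Equal validity: every department gets the average validities.
    equal_validity = [means[:] for _ in observed]

    # Equal ratio: average validities times a department-specific constant.
    denom = sum(m * m for m in means)
    if denom == 0:
        raise ValueError("average validities are all zero")
    equal_ratio = []
    for dept in observed:
        # ASSUMPTION: least-squares constant, not necessarily the report's formula.
        c = sum(r * m for r, m in zip(dept, means)) / denom
        equal_ratio.append([c * m for m in means])
    return means, equal_validity, equal_ratio


# Hypothetical validities for three departments (not data from the report).
data = [[0.20, 0.10, 0.30], [0.35, 0.25, 0.40], [0.05, 0.15, 0.10]]
means, ev, er = implied_validities(data)
print([round(m, 3) for m in means])
print([[round(v, 3) for v in row] for row in er])
```

In this sketch the equal validity hypothesis estimates one parameter per measure, while the equal ratio hypothesis adds one constant per department, so its implied validities follow the observed ones much more closely.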

For the applicant pools, the theoretical validities were found
by averaging validities across the pools. Then, using the formula of
Appendix D, "reverse" corrections were made for the effects of
selection on the test scores and the undergraduate school performance
variable to obtain validities implied by the equal validity hypothesis.
Another set of implied validities was obtained for the equal ratio
hypothesis by applying the ratio-preserving procedures of the formula
of Appendix C to the applicant pool validities and again using the
reverse correction of Appendix D for the effects of selection on the
GRE General Test and undergraduate record.

For generalization at the level of all GRE takers, the
validities estimated for all GRE takers were averaged across groups and
the averages taken as theoretical validities. Then, using the formula
of Appendix D, the reverse correction was applied in two steps to these
theoretical validities. The first step accounted for selection on true
test scores; the second step accounted for selection on the GRE General
Test scores and the undergraduate school performance variable. The two
corrections produced implied validities that should be observed in the
VSS groups if the generalization hypothesis were true. Another set of
implied validities was obtained by applying the ratio-preserving
procedures of Appendix C to the validities for the pool of all GRE
takers and again applying the two-step correction process.

The results of these computations were evaluated in several
ways. First, for each test-hypothesis-group combination, the means and
standard deviations for the implied and observed validities were
compared, and the correlations between implied and observed validities
were computed. Second, the percent of variance of the observed
validities that was accounted for by the implied validities and
sampling error was calculated. The percent accounted for by the
implied validities was simply the square of the correlation between
implied and observed validities multiplied by one hundred. The percent
of variation of the observed validities accounted for by sampling error
was calculated by averaging the error variances of the individual
coefficients, dividing the average by the test variance, and
multiplying by one hundred. The sample variances of the observed
validities were calculated using the same formula used by Pearlman et
al. (1980). Third, for the equal ratio hypothesis applied in the pool
of all GRE takers, the differences between the implied and observed
validities were tested for significance, and the patterns of
significance were compared to the types of departments. Finally, the
means, standard deviations, fifth percentile values of validities, and
percent positive validities were found for the VSS groups, the
applicant pools, and the GRE takers. For the applicant pools and the
GRE takers, the statistics were obtained for both test scores and true
score validities. The fifth percentile was used because, in many
validity generalization studies, a figure called the 95 percent
credibility value is reported. It is the value above which 95 percent
of true score validities are expected over a series of studies.
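
The second evaluation step can be sketched as below. Two points are assumptions rather than statements from the text: the error variance of each coefficient is taken as (1 - mean r squared) squared divided by (N - 1), a large-sample approximation common in the validity generalization literature, and the divisor for the sampling error percent is read as the variance of the observed validities. Names and example data are hypothetical.

```python
import math


def percent_variance_accounted(observed, implied, sample_sizes):
    """Percent of variance in observed validities accounted for by
    (a) the implied validities and (b) sampling error."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_imp = sum(implied) / n
    var_obs = sum((r - mean_obs) ** 2 for r in observed) / n
    var_imp = sum((r - mean_imp) ** 2 for r in implied) / n
    if var_obs == 0:
        raise ValueError("observed validities do not vary")
    if var_imp == 0:
        # Constant implied validities cannot explain any variation.
        pct_implied = 0.0
    else:
        cov = sum((o - mean_obs) * (i - mean_imp)
                  for o, i in zip(observed, implied)) / n
        corr = cov / math.sqrt(var_obs * var_imp)
        pct_implied = 100.0 * corr ** 2
    # ASSUMPTION: large-sample error variance of a correlation coefficient.
    error_vars = [(1.0 - mean_obs ** 2) ** 2 / (k - 1) for k in sample_sizes]
    pct_sampling = 100.0 * (sum(error_vars) / n) / var_obs
    return pct_implied, pct_sampling


# Hypothetical example with three small studies.
print(percent_variance_accounted([0.1, 0.3, 0.5], [0.2, 0.3, 0.4], [26, 26, 26]))
```

With samples this small the sampling error percent alone can exceed one hundred, which shows how the two percents can sum to more than the total variance when the implied validities are themselves fitted to noisy coefficients.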



Statistics for all GRE takers were, for the verbal,
quantitative, and analytical scores, respectively, as follows: means
were 494, 532, and 516; standard deviations were 123, 133, and 129;
standard errors of measurement were 35, 38, and 37; and reliabilities
were all .92. The correlations were as follows: .5 between verbal and
quantitative, .67 between verbal and analytical, and .7 between
quantitative and analytical.
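
These figures can be checked against the usual relation between the standard error of measurement, the standard deviation, and the reliability. The sketch below uses only the values reported above; the function name is hypothetical, and small discrepancies are to be expected because the reliabilities are rounded to two places.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), from classical test theory."""
    return sd * math.sqrt(1.0 - reliability)


# (standard deviation, reported SEM) for each score, reliability .92.
reported = {
    "verbal": (123, 35),
    "quantitative": (133, 38),
    "analytical": (129, 37),
}
for name, (sd, sem) in reported.items():
    computed = standard_error_of_measurement(sd, 0.92)
    print(f"{name}: computed {computed:.1f}, reported {sem}")
```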



Table 1 contains the means of the validities observed using
VSS group data, as well as those estimated under the six generalization
hypotheses. Clearly, the generalization hypotheses all led to implied
validities that were, on the average, at the right level.

Table 2 contains the standard deviations of the validities
observed using VSS group data, as well as those estimated under the six
generalization hypotheses. Note that, applied in the VSS groups, the
hypothesis of equal validities led to standard deviations that were
depressed, but also to better approximations of the observed VSS
standard deviations when applied in the applicant and GRE taker groups.
The standard deviations of the generalized validities based on the
equal ratio hypothesis were not depressed for the VSS groups, but fit
well regardless of the groups over which the generalization was made.

Table 3 contains the correlations of the validities observed
using VSS group data with those estimated under each generalization
hypothesis for each group. Note that the equal validity hypothesis
yields implied validities for the quantitative score that have almost
no correlation with the observed values, and that the corresponding
correlations for the other measures were low. The equal ratio
hypothesis yields implied validities that correlate much higher with
the observed values, but the correlation declines as successive
corrections for range restriction were made.

Another figure of merit for evaluating the success of validity
generalization was the percent of variance of observed validity
coefficients accounted for by the implied validities and sampling
variance. The percents obtained for the equal validity hypothesis are
presented in Table 4. Because the average was used in this
computation, the VSS group figures of 58, 51, and 64 for the verbal,
quantitative, and analytical scores, respectively, were the percents of
variance of observed validities accounted for by sampling error alone.
It can be seen that these quantities are substantial, due probably to
the small sample sizes involved. The application of the equal validity
hypothesis in the applicant and GRE taker pools afforded little or no
improvement over the percent of variance accounted for by sampling
error, as one would expect from the very modest correlations for that
hypothesis in Table 3. The results for the equal ratio hypothesis were
not presented in Table 4 because seven out of nine of them were in
excess of one hundred, with the other two also extremely large. These
are clearly unacceptable results, which no doubt were obtained because
the validities implied by the equal ratio hypothesis were markedly
affected by sampling errors; the large correlations for the equal ratio
hypothesis in Table 3 almost certainly were substantially subject to
similar error. The results for the equal validity hypothesis were much
less affected by such errors because the amount of overdetermination in
calculating the theoretical validities was much less; only three
parameters were estimated using the equal validity hypothesis, but 82
(three measures plus 79 institutions) parameters were found for the
equal ratio hypothesis.

The observed VSS group validities were tested individually for
significant differences from the generalized validities based on the
ratio model applied to validities for all GRE takers. With 79
coefficients, one would expect almost four of these tests to reject the
null hypothesis at the five percent level. Five of them were
significant for the verbal score, four of them were significant for the
quantitative score, and t[illegible] were significant for the analytical
score. There were eight patterns of significance and non-significance
for the three measures, and six of them were observed over the eight
departments where significance was noted. Also, different patterns
were noted for departments of the same kind. These results were
essentially at the chance levels and with no consistent pattern
discernible.
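
The chance expectation quoted above follows from simple arithmetic, sketched here. The binomial calculation additionally assumes that the 79 tests are independent, which the text does not state; the function names are hypothetical.

```python
from itertools import product
from math import comb


def expected_rejections(n_tests, alpha):
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(at least k rejections), ASSUMING independent tests (binomial)."""
    return sum(
        comb(n_tests, i) * alpha ** i * (1.0 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


# 79 coefficients tested at the five percent level: "almost four" expected.
print(expected_rejections(79, 0.05))

# Chance probability of five or more rejections for a single measure.
print(round(prob_at_least(5, 79, 0.05), 3))

# Significant / non-significant on three measures gives 2**3 = 8 patterns.
patterns = list(product([False, True], repeat=3))
print(len(patterns))
```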

Table 5 contains the means, standard deviations, fifth
percentile values of test and true score validities, and percent
positive validities for the VSS, applicant, and GRE taker groups. In
it can be seen an expected increase in validity as one scans from the
restricted VSS groups, through the applicants, to all GRE takers. Note
also that true score validities were greater than test score
validities, but not greatly so. True score validities were not
presented for the VSS groups because the test score reliabilities in
those groups were not known. The table shows little difference in the
standard deviations of validities. Negative validities exist at all
levels of restriction, but by far the greatest majority of coefficients
were positive.

Discussion 



The context in which validity generalization research arose
was that of industrial hiring. Substantial degrees of validity
generalization have been reported in this context; i.e., differences in
observed validities have been accounted for by statistical artifacts
such as restriction of the tests due to their use in hiring, and
variation in criterion reliability. Occupations over which
generalization has been made cover a wide variety of settings, perhaps
even wider than might be encountered across academic institutions, over
which one might therefore expect validity to generalize. This
surmise was supported by Linn et al. (1981), who found 70 percent
generalization in a study of law school validities. An expectation of
the present study was that an even higher percent of variance might be
explained if a more complete modeling of the selection procedure were
possible using range restriction techniques. Therefore, multivariate
corrections for restriction on GRE scores, GRE true scores, and
undergraduate school performance were employed. The data were more
complete than has usually been the case in such studies, because data
on the actual applicant pools were available, and it was also possible
to construct a standardized national population to control the
variation in test reliability among groups of applicants. Even so, the
generalization hypothesis of equal validities gave a very poor
accounting of the observed validities for both the applicant pool and
the national pool of GRE takers. A large portion (58 percent, 51
percent, and 64 percent for the verbal, quantitative and analytical
measures, respectively) of the variation of observed validity
coefficients was due to sampling variation arising from the small
sample sizes. In comparison, Boldt (1985) found that 26 percent and 29
percent of variation of coefficients of validity of SAT-V and SAT-M,
respectively, was accounted for by sampling, and Linn and Hastings
(1984) report 11 percent accounted for in their LSAT study.
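
The percentage of variation attributed to sampling can be illustrated with a small calculation. The following is a sketch only: it uses the familiar approximation for the sampling variance of a correlation, (1 - r^2)^2 / (N - 1), evaluated at the mean validity and averaged without weights. The report does not state which formula or weighting was used, so the formula, the function name, and the example numbers are all assumptions.

```python
def percent_due_to_sampling(validities, sample_sizes):
    """Percent of observed variance in validities expected from sampling error.

    Assumes the approximation var(r) = (1 - mean_r**2)**2 / (N - 1) and an
    unweighted average across studies (both are assumptions of this sketch).
    """
    k = len(validities)
    mean_r = sum(validities) / k
    observed_var = sum((r - mean_r) ** 2 for r in validities) / k
    sampling_var = sum((1 - mean_r**2) ** 2 / (n - 1) for n in sample_sizes) / k
    return 100.0 * sampling_var / observed_var


# Hypothetical example: three small studies of 50 students each.
example_validities = [0.10, 0.30, 0.50]
example_sizes = [50, 50, 50]
example_percent = percent_due_to_sampling(example_validities, example_sizes)
print(f"{example_percent:.1f} percent of the variance is attributable to sampling")
```

With small departmental samples, the expected sampling variance is a large fraction of the observed variance, which is the pattern the percentages above describe.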



In addition to the hypothesis of equal validities, a
hypothesis of equal validity ratios was tested for the VSS groups, the
applicant groups, and the GRE takers. No association of patterns of
significance with discipline was found. This result was not expected,
because a difference might reasonably be expected between quantitative
and nonquantitative disciplines, for example. But Braun and Jones
(1985), in an empirical Bayes study of the structure of coefficients of
regression of graduate GPA on the GRE General Test, also failed to find
differences associated with the discipline. The failure to find a
systematic pattern in validities in the present study may have occurred
because there is none, but it could also result from the large
influence of sampling errors on the implied validities obtained using
the equal ratio hypothesis.



The validity generalization work in industry established that
site differences in validity were influenced by variations in
selectivity and in criterion reliabilities. A major conclusion was
that low or negative validities were the exception, implying that a
study producing such validities was as suspect as the test involved.
Therefore, if a study finds very low or negative validity at a site,
this suggests that additional research is needed. Perhaps the
criterion needs improving, or perhaps there was a computational or
clerical error. The test itself should not be immediately suspect.
This industrial research emphasizes a principle that should be more
generally appreciated: validity coefficients based on selected
incumbents can be poor estimates of a test's actual validity.

In the present study, the validity of the General Test
appears to be highly specific, or at least greatly affected by sampling
variation. Even so, the thrust of the results of this study coincides
with that of the industrial work. The results that provide this thrust
are that, even though the lower fifth percentile of the validities
ranged from -.01 to -.12 across the three scores, the percent of
positive validities was in the high 80s to mid 90s; that is, the
great preponderance of validities were positive. The average
validities were in the mid-twenties for the VSS groups, rising to the
mid-thirties for true scores for all GRE takers. In view of the
scarcity of very low or negative validities, studies of the GRE General
Test that yield such validities should be questioned.






REFERENCES 



American Educational Research Association, American Psychological
Association, & National Council on Measurement in Education
(1985). Standards for educational and psychological
testing. Washington, DC: American Psychological Association.

American Psychological Association, Division of Industrial-
Organizational Psychology (1975). Principles for the
validation and use of personnel selection procedures.
Dayton, OH: The Industrial-Organizational Psychologist.

American Psychological Association, Division of Industrial-
Organizational Psychology (1980). Principles for the
validation and use of personnel selection procedures
(2nd edition). Berkeley, CA: Author.

Baird, L. L. (1983). Predicting predictability: The influence of
student and institutional characteristics on the prediction of
grades. College Board Report No. 83-5. New York: College
Entrance Examination Board.

Boldt, R. F. (1985). Generalization of SAT validity across
colleges. College Board Report No. 86-3. New York: College
Entrance Examination Board.

Braun, H. I., & Jones, D. H. (1985). Use of empirical Bayes methods
in the study of the validity of academic predictors of
graduate school performance. GRE Board Professional Report
GREB No. 79-13P. Princeton, NJ: Educational Testing
Service.

Equal Employment Opportunity Commission, Civil Service Commission,
Department of Labor, & Department of Justice (1978). Adoption
by four agencies of Uniform Guidelines on Employee Selection
Procedures. Federal Register, 43, 38290-38315.

Graduate Record Examinations Board (1985). Manual for participation
in the Graduate Record Examinations Validity Study Service.
Princeton, NJ: Educational Testing Service.

Guion, R. M. (1976). Recruiting, selection and job placement. In M.
D. Dunnette (Ed.), Handbook of industrial and organizational
psychology (pp. 562-575). Chicago: Rand McNally.

Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.

Linn, R. L., & Hastings, C. N. (1984). A meta analysis of the
validity of predictors of performance in law school.
Journal of Educational Measurement, 21, 245-259.

Linn, R. L., Harnisch, D. L., & Dunbar, S. B. (1981). Validity
generalization and situational specificity: An analysis of
the prediction of first-year grades in law school. Applied
Psychological Measurement, 5, 281-289.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of
mental test scores. Reading, MA: Addison-Wesley.

Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity
generalization results for tests used to predict job
proficiency and training success in clerical occupations.
Journal of Applied Psychology, 65, 373-406.

Schmidt, F. L., Gast-Rosenberg, I., & Hunter, J. E. (1980).
Validity generalization results for computer programmers.
Journal of Applied Psychology, 65, 643-661.

Schmidt, F. L., & Hunter, J. E. (1977). Development of a general
solution to the problem of validity generalization. Journal
of Applied Psychology, 62, 529-540.

Schmidt, F. L., Hunter, J. E., Pearlman, K., & Shane, G. S. (1979).
Further tests of the Schmidt-Hunter Bayesian validity
generalization procedure. Personnel Psychology, 32,
257-281.

Thorndike, R. L. (1982). Applied psychometrics. Boston: Houghton
Mifflin.

Wallmark, M. (1982a). Test analysis, aptitude test form [illegible]. ETS
Statistical Report 82-76. Princeton, NJ: Educational
Testing Service.

Wallmark, M. (1982b). Test analysis, aptitude test form 3DGR3. ETS
Statistical Report 82-64. Princeton, NJ: Educational
Testing Service.

Wallmark, M. (1982c). Test analysis, aptitude test form 3DGR1. ETS
Statistical Report 82-35. Princeton, NJ: Educational
Testing Service.






TABLE 1

MEAN VALIDITIES FOR THE VSS GROUPS, OBSERVED AND
IMPLIED, BASED ON THE TWO GENERALIZATION HYPOTHESES
APPLIED IN THE THREE SETS OF GROUPS

                            VERBAL    QUANT    ANALYT

OBSERVED IN VSS DATA          .23      .24      .28

IMPLIED VALIDITIES OBTAINED USING HYPOTHESIS AND
GROUP INDICATED

VSS GROUPS
  EQUAL VALIDITIES            .23      .24      .28
  EQUAL RATIOS                .23      .24      .28

APPLICANT POOLS
  EQUAL VALIDITIES            .22      .23      .26
  EQUAL RATIOS                .22      .24      .26

ALL GRE TAKERS
  EQUAL VALIDITIES            .22      .21      .27
  EQUAL RATIOS                .23      .23      .28






TABLE 2

STANDARD DEVIATIONS OF VALIDITIES FOR THE VSS GROUPS,
OBSERVED AND IMPLIED, BASED ON THE GENERALIZATION
HYPOTHESES APPLIED IN THE THREE SETS OF GROUPS

                            VERBAL    QUANT    ANALYT

OBSERVED IN VSS DATA          .19      .20      .18

IMPLIED VALIDITIES OBTAINED USING HYPOTHESIS AND
GROUP INDICATED

VSS GROUPS
  EQUAL VALIDITIES            .09      .00      .00
  EQUAL RATIOS                .15      .16      .18

APPLICANT POOLS
  EQUAL VALIDITIES            .25      .15      .1?
  EQUAL RATIOS                .16      .19      .18

ALL GRE TAKERS
  EQUAL VALIDITIES            .22      .15      .15
  EQUAL RATIOS                .17      .19      .19






TABLE 3

VSS GROUP CORRELATIONS BETWEEN THE OBSERVED AND
IMPLIED VALIDITIES, BASED ON THE TWO GENERALIZATION
HYPOTHESES APPLIED IN THE THREE SETS OF GROUPS

HYPOTHESIS                  VERBAL    QUANT    ANALYT

VSS GROUPS
  EQUAL VALIDITIES            .00      .03      .00
  EQUAL RATIOS                .83      .78      .90

APPLICANT POOLS
  EQUAL VALIDITIES            .15      .00      .17
  EQUAL RATIOS                .67      .65      .82

ALL GRE TAKERS
  EQUAL VALIDITIES            .15      .08      .23
  EQUAL RATIOS                .56      .72      .74






TABLE 4

PERCENT OF VSS GROUP VALIDITIES ACCOUNTED FOR
BY SAMPLING ERROR AND THE EQUAL VALIDITY HYPOTHESIS
APPLIED IN THE THREE SETS OF GROUPS

GROUP                       VERBAL    QUANT    ANALYT

VSS GROUPS                    58       51       64

APPLICANT POOLS               60       51       67

ALL GRE TAKERS                60       52       69






TABLE 5

MEANS, STANDARD DEVIATIONS, FIFTH PERCENTILE VALUES
OF TEST AND TRUE SCORE VALIDITIES, AND PERCENT POSITIVE
VALIDITIES, FOR THE VSS, APPLICANT, AND GRE TAKER GROUPS

               VSS GROUPS      APPLICANT GROUPS           GRE TAKERS
               TEST SCORE   TEST SCORE  TRUE SCORE   TEST SCORE  TRUE SCORE

MEANS
  VERBAL          .23          .27         .29          .32         .34
  QUANT           .24          .30         .32          .36         .38
  ANALYT          .28          .31         .33          .39         .41

ST. DEV.
  VERBAL          .19          .21         .22          .21         .22
  QUANT           .20          .21         .23          .26         .28
  ANALYT          .18          .20         .21          .21         .22

5 %-ILE
  VERBAL         -.11         -.09        -.09         -.08        -.08
  QUANT          -.13         -.06        -.06         -.12        -.12
  ANALYT         -.01         -.02        -.02         -.06        -.06

PERCENT POSITIVE
  VERBAL           85           90          90           92          92
  QUANT            90           92          92           87          87
  ANALYT           95           94          94           95          95






APPENDIX A 



USE OF TEST THEORY TO REPRESENT THE EFFECTS
OF SELF SELECTION



We make a very unrestrictive assumption that self selection
and external forces that steer a person towards a particular
department can be represented by a vector of variables, and it will
be seen that they do not need to be identified. The variables can be
represented by the vector variable X. Suppose that, for all
GRE-takers, the joint distribution of these variables with a vector
variable T of true scores from the subtests of the GRE General Test
is a function J(X,T), and assume that errors of measurement are
independent of X and T, with distribution D(E). Then, for all
GRE-takers, the joint distribution of all these variables is
J(X,T)D(E), and the joint distribution of T and E would just be the
marginal distribution of T, times D(E).

Now suppose self-selection takes place. By hypothesis,
and not a very restrictive one, it occurs through the operation of
explicit selection on X, and could be represented as G(X)J(X,T), where G
adjusts the frequencies according to however the selection worked.
Note that selection does not operate explicitly on T, since T
cannot be observed. There would be a different G for each
institution, and the marginal distribution of T for each institution
would be the integral over the space of X of the product of G and
J. Since the errors of measurement are independent by hypothesis,
the distribution of E would be unaffected. But there would be an
adjustment in the distribution of T. Hence the test score
distributions would differ only by the distribution of T, and the
distribution of E, conditional on T, is unaffected and the range
restriction formulas apply. Thus X operates on T so that even
though T is not an explicit selector, it can take that role in the
range restriction formulas because the conditional distributions of E
are not affected. In particular, the standard error of measurement is
unaffected by the selection and, because the expectation of errors of
measurement given true score is zero, the covariance of test scores
with true scores and the variance of true scores are equal in both the
selected and unselected groups, hence the regression constants are the
same in both groups.

Note, as was mentioned above, the really helpful fact that
the variables in X need not be known, nor do the forms of J and G.
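
The argument can be illustrated with a small simulation. The following is a sketch under stated assumptions, not part of the original analysis: X is a single normal variable, the true score T depends linearly on X, the error E is independent of both, and selection G(X) is a simple cutoff on X. All numerical values and function names are hypothetical.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Draw (X, T, E) triples with E independent of X and T."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)             # hypothetical selection variable X
        t = 0.6 * x + rng.gauss(0.0, 0.8)   # true score T, correlated with X
        e = rng.gauss(0.0, 0.5)             # error of measurement E
        rows.append((x, t, e))
    return rows


def summarize(rows):
    """Variance of T, standard error of measurement, and slope of test score on T."""
    t = [r[1] for r in rows]
    e = [r[2] for r in rows]
    y = [ti + ei for ti, ei in zip(t, e)]   # observed test score
    mean_t, mean_y = statistics.fmean(t), statistics.fmean(y)
    cov_yt = sum((yi - mean_y) * (ti - mean_t) for yi, ti in zip(y, t)) / len(rows)
    var_t = statistics.pvariance(t)
    return {"var_t": var_t, "sem": statistics.pstdev(e), "slope": cov_yt / var_t}


everyone = simulate()
selected = [r for r in everyone if r[0] > 0.5]   # explicit selection on X only

full = summarize(everyone)
sel = summarize(selected)
print("unselected:", full)
print("selected:  ", sel)
```

In this sketch the variance of T shrinks in the selected group, while the standard error of measurement and the slope of test score on true score stay essentially the same, as the appendix argues.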






APPENDIX B



USE OF A SUPPLEMENTARY VARIABLE WHEN DATA ARE
MISSING FOR AN EXPLICIT SELECTOR



When capturing data for the routine operations of a secure
testing program, for the majority of examinees it is only necessary to
obtain test scores and application information. Some examinees,
however, may attend institutions that will supply data to the program
operator for use in a validity study in which the relationships of
test scores, sending institution grades, and receiving institution
grades of applicants are studied. If it is desired to estimate
validity in an applicant pool, statistics are needed for the explicit
selectors in both the applicant and incumbent pools in order to make
the needed corrections for the effects of selection. One lacks,
however, the sending institution statistics in the applicant pool, and
they must therefore be estimated. This can be done if a supplementary
variable exists that is present in both the applicant and incumbent
pools and that is correlated with the missing explicit selector. This
supplementary variable takes the role of a variable subject to the
effects of selection. Because it is observed in both the applicant
and incumbent pools, it can be used to estimate the missing
statistics.

In the present case, the sending institutions are
undergraduate schools, the receiving institutions are graduate
departments, the incumbents are the graduate students whose data are
used, and the applicant pool consists of those who apply to the
receiving institutions. The explicit selectors that act on the
applicant pool to create the incumbent pool are those of the GRE
General Test and undergraduate school grades. The supplementary
variable can be a self-reported analog to the undergraduate school
grade since a test program can easily collect examinee-reported
biographical information on the registration form.

The range restriction assumptions are that the coefficients
of regression of variables subject to selection on the explicit
selectors are undisturbed by the selection process, as are the errors
of prediction of the variables subject to selection by the explicit
selectors. Therefore, the following normal equations for estimating
regression coefficients in the applicant pool are satisfied by
regression coefficients calculated in the incumbent pool.






    C_{vs} = C_{vv} B_v + C_{vq} B_q + C_{va} B_a + C_{vp} B_p    (1)

    C_{qs} = C_{qv} B_v + C_{qq} B_q + C_{qa} B_a + C_{qp} B_p    (2)

    C_{as} = C_{av} B_v + C_{aq} B_q + C_{aa} B_a + C_{ap} B_p    (3)

The symbols v, q, a, p, and s stand for verbal, quantitative,
analytic, actual undergraduate school grade, and self-reported
undergraduate school grade, respectively. C_{uv} is the covariance of u
and v calculated in the applicant pool and B_x is the coefficient of
partial regression of s on x calculated in the incumbent pool, hence
known. Further, all covariances for which both variables are observed
by the test program in the applicant pool are known. This leaves C_{vp},
C_{qp}, and C_{ap} as the only unknown quantities in equations (1), (2),
and (3) respectively. To find them use

    C_{vp} = (C_{vs} - C_{vv} B_v - C_{vq} B_q - C_{va} B_a) / B_p,        (4)

    C_{qp} = (C_{qs} - C_{qv} B_v - C_{qq} B_q - C_{qa} B_a) / B_p, and    (5)

    C_{ap} = (C_{as} - C_{av} B_v - C_{aq} B_q - C_{aa} B_a) / B_p.        (6)
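
Equations (4) through (6) can be sketched in code as follows. This is a minimal illustration, not the program used in the study: the function name, the dictionary layout, and every numerical value are hypothetical and chosen only so the example is self-consistent.

```python
TESTS = ("v", "q", "a")   # verbal, quantitative, analytical


def missing_covariances(cov_tests, cov_with_s, b):
    """Solve equations (4)-(6) for C_vp, C_qp, and C_ap.

    cov_tests[(k, j)]: applicant-pool covariance of test scores k and j
    cov_with_s[k]:     applicant-pool covariance of test k with self-report s
    b[j]:              incumbent-pool partial regression weight of s on j,
                       for j in v, q, a, p
    """
    result = {}
    for k in TESTS:
        explained = sum(cov_tests[(k, j)] * b[j] for j in TESTS)
        result[k] = (cov_with_s[k] - explained) / b["p"]
    return result


# Hypothetical applicant-pool covariances among the three test scores.
cov_tests = {
    ("v", "v"): 1.00, ("v", "q"): 0.50, ("v", "a"): 0.60,
    ("q", "v"): 0.50, ("q", "q"): 1.00, ("q", "a"): 0.65,
    ("a", "v"): 0.60, ("a", "q"): 0.65, ("a", "a"): 1.00,
}
# Hypothetical regression weights of s on v, q, a, and p.
b = {"v": 0.05, "q": 0.04, "a": 0.03, "p": 0.80}
# "True" covariances with p, used only to build a consistent example.
true_cov_with_p = {"v": 0.30, "q": 0.25, "a": 0.28}

# Covariances with s implied by normal equations (1)-(3).
cov_with_s = {
    k: sum(cov_tests[(k, j)] * b[j] for j in TESTS) + true_cov_with_p[k] * b["p"]
    for k in TESTS
}

recovered = missing_covariances(cov_tests, cov_with_s, b)
print(recovered)
```

Because the example covariances with s were generated from equations (1) through (3), the function recovers the covariances with p that were used to build them.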



From the assumption that the errors of prediction are
unaffected by the selection process, we obtain

    C_{ss} = c_{ss} + b'(||C_{xx}|| - ||c_{xx}||) b    (7)

where ||C_{xx}|| and ||c_{xx}|| are the explicit selector variance-
covariance matrices in the applicant and incumbent pools,
respectively, and b is a column vector of partial regression
coefficients, the B_x. With the computations of equations (4)
through (6) completed, all the quantities needed for equation (7) are
available except the variance of the sending institution grade in the
applicant pool, C_{pp}. If we define ||C_{xx}*|| as being the same as
||C_{xx}|| except with a zero in the position of C_{pp}, which is third
column and third row, then (7) becomes

    C_{ss} = c_{ss} + b'(||C_{xx}*|| - ||c_{xx}||) b + C_{pp} B_p^2    (8)

All quantities in (8) are known except C_{pp}, for which the solution is

    C_{pp} = (C_{ss} - c_{ss} - b'(||C_{xx}*|| - ||c_{xx}||) b) / B_p^2    (9)

With the calculation of C_{pp} in equation (9), all the entries in
||C_{xx}|| are available.
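
The last step, equations (8) and (9), can be sketched as below. This is an illustration under assumptions: for brevity the example uses only two explicit selectors (one test score and the grade p), the position of p in the matrices is passed in as an index, and all names and numbers are hypothetical.

```python
def quad_form(b, m):
    """Return b' m b for a vector b and a square matrix m (lists of floats)."""
    n = len(b)
    return sum(b[i] * m[i][j] * b[j] for i in range(n) for j in range(n))


def solve_c_pp(C_ss, c_ss, C_star, c_xx, b, p_index):
    """Equation (9): applicant-pool variance of the missing selector p.

    C_star is the applicant-pool selector covariance matrix with a zero
    placed where the unknown variance C_pp belongs.
    """
    n = len(b)
    diff = [[C_star[i][j] - c_xx[i][j] for j in range(n)] for i in range(n)]
    return (C_ss - c_ss - quad_form(b, diff)) / b[p_index] ** 2


# Hypothetical two-selector example: index 0 is a test score, index 1 is p.
C_xx_true = [[1.0, 0.4], [0.4, 0.8]]   # applicant pool (0.8 is treated as unknown)
c_xx = [[0.6, 0.2], [0.2, 0.5]]        # incumbent pool
b = [0.1, 0.7]                         # regression weights of s on the selectors
c_ss = 0.5                             # incumbent-pool variance of s

# Applicant-pool variance of s implied by equation (7).
full_diff = [[C_xx_true[i][j] - c_xx[i][j] for j in range(2)] for i in range(2)]
C_ss = c_ss + quad_form(b, full_diff)

# Zero out the unknown entry, then recover it with equation (9).
C_star = [[1.0, 0.4], [0.4, 0.0]]
c_pp_estimate = solve_c_pp(C_ss, c_ss, C_star, c_xx, b, p_index=1)
print(c_pp_estimate)
```

The recovered value matches the variance used to construct the example, which is a consistency check on the algebra rather than evidence about real data.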



APPENDIX C 



GENERALIZING USING THE ASSUMPTION THAT THE
VALIDITIES ARE PROPORTIONAL ACROSS INSTITUTIONS



According to the hypothesis that the ratios of the
validities of the subtests of the GRE General Test are the same across
institutions, the validity, V_{it}, of a subtest, t, for institution
i, is the product of F_t, a constant associated with the
subtest, and G_i, a constant associated with the institution. Then

    V_{it} = F_t G_i    (1)

Neglecting the error,

    V_{.t} = F_t G_.    (2)

where the dot indicates averaging over the missing variable, i in
this case. Then, using (1) and (2)

    V_{it} / V_{.t} = G_i / G_.    (3)

for any of the subtests. Therefore, the ratio in (3) is a single
constant for institution i,

    K_i = G_i / G_.    (4)

Then, from equation (2),

    V_{.t} K_i = F_t G_i    (5)

Thus, equation (5) gives the formula for estimating the validity of
the test under the hypothesis that the ratios of the validities of the
subtests are the same for all institutions. Since the average test
validities are both multiplied by the same value, K_i, their ratio
is the same for all groups, and their level varies with variation in
the magnitude of K_i.
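
The two generalization hypotheses can be contrasted in a short sketch. This is illustrative only: the appendix as recovered does not show how the institution constant was estimated, so the estimator used here (an institution's mean validity divided by the grand mean) is an assumption, as are the function names and the example validities.

```python
def column_means(validities):
    """Mean validity of each subtest across institutions."""
    n_inst = len(validities)
    n_sub = len(validities[0])
    return [sum(row[t] for row in validities) / n_inst for t in range(n_sub)]


def implied_equal_validity(validities):
    """Equal validity hypothesis: every institution gets the subtest mean."""
    means = column_means(validities)
    return [list(means) for _ in validities]


def implied_equal_ratio(validities):
    """Equal ratio hypothesis: implied V_it = V_.t * K_i.

    K_i is estimated here as the institution's mean validity divided by the
    grand mean (an assumed estimator, not taken from the report).
    """
    means = column_means(validities)
    grand_mean = sum(means) / len(means)
    implied = []
    for row in validities:
        k_i = (sum(row) / len(row)) / grand_mean
        implied.append([k_i * m for m in means])
    return implied


# Hypothetical observed validities: rows are institutions, columns are
# the verbal, quantitative, and analytical measures.
observed = [
    [0.10, 0.20, 0.30],
    [0.30, 0.20, 0.40],
    [0.25, 0.35, 0.15],
]

equal_validity = implied_equal_validity(observed)
equal_ratio = implied_equal_ratio(observed)
for row in equal_ratio:
    print([round(v, 3) for v in row])
```

Under the equal ratio hypothesis the implied validities keep the same subtest-to-subtest ratios at every institution while their overall level moves with the institution constant; under the equal validity hypothesis every institution receives the same three values.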






APPENDIX D 

CALCULATING VALIDITIES IN THE RESTRICTED GROUP

In the present study, once the generalization has been
made, we want to reverse the range restriction calculations from the
population of all GRE takers to an applicant pool, or from an
applicant pool to a VSS group. In this situation, the validity in the
unselected group is known, but not the validity in the restricted
group. One would wish to have a restricted criterion variance that is
consistent with the restricted validity. If we regard as our second
set of unknowns the ratio of the criterion variances, there is enough
information to solve the problem. This can be seen as follows. The
assumption that selection does not affect the regression function can
be written as follows:

    ||C_{xx}||^{-1} ||C_{xy}|| = ||c_{xx}||^{-1} ||c_{xy}|| = b    (1)

where ||C_{uv}|| and ||c_{uv}|| are covariance matrices for variables u
and v for the unrestricted and restricted populations, respectively.
The x variables are the explicit selectors; y is subject to
selection and is a single variable in this project. Therefore, the
covariances involving both x and y are arranged in a column
vector, as is b. Equation (1) can be rewritten as

    ||C_{xx}||^{-1} ||S_x|| ||R_{xy}|| S_y = ||c_{xx}||^{-1} ||s_x|| ||r_{xy}|| s_y    (2)

where ||M|| = ||C_{xx}||^{-1} ||S_x|| ||R_{xy}||, so that b = ||M|| S_y;
S_y and s_y are the standard deviations of y for the unrestricted
and restricted populations, respectively; ||s_x|| and ||S_x|| are
diagonal matrices of standard deviations of the explicit selectors for
the restricted and unrestricted populations, respectively; and ||R_{xy}||
and ||r_{xy}|| are the correlations between explicit selectors and y for
the unselected and selected populations, arranged as column vectors.
The assumption that selection does not affect the errors of prediction
of y by x can be written as follows:

    S_y^2 - b' ||C_{xx}|| b = s_y^2 - b' ||c_{xx}|| b ,

and hence as

    S_y^2 = s_y^2 + b'(||C_{xx}|| - ||c_{xx}||) b ,

    S_y^2 = s_y^2 + ||M||'(||C_{xx}|| - ||c_{xx}||) ||M|| S_y^2 .

Therefore,

    (S_y / s_y) = 1 / (1 - ||M||'(||C_{xx}|| - ||c_{xx}||) ||M||)^{1/2} = F .    (3)

Then using (2), and the definitions of ||M|| and F,

    ||s_x||^{-1} ||c_{xx}|| ||M|| F = ||r_{xy}|| .    (4)

All the information needed to carry out the calculation indicated on
the left hand side of equation (4) is known after the generalized
unrestricted validities are obtained. The development of this
computation used the range restriction assumptions to arrive at
restricted correlations without using standard deviations of y, in
order to obtain estimates consistent with the assumptions.
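
The calculation in equations (1) through (4) can be sketched in a few lines. This is a minimal illustration under the appendix's assumptions (selection leaves the regression function and the errors of prediction unchanged); the function names, the pure-Python linear solver, and the example matrices are all hypothetical.

```python
import math


def solve(a, rhs):
    """Solve the linear system a x = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def matvec(a, v):
    """Multiply matrix a by vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def restricted_validities(C_xx, c_xx, R_xy):
    """Equation (4): validities in the restricted group.

    C_xx, c_xx: selector covariance matrices, unrestricted and restricted.
    R_xy:       unrestricted correlations of each selector with y.
    """
    n = len(R_xy)
    S_x = [math.sqrt(C_xx[i][i]) for i in range(n)]
    s_x = [math.sqrt(c_xx[i][i]) for i in range(n)]
    # M = C_xx^{-1} S_x R_xy, so that b = M * S_y.
    M = solve(C_xx, [S_x[i] * R_xy[i] for i in range(n)])
    diff = [[C_xx[i][j] - c_xx[i][j] for j in range(n)] for i in range(n)]
    quad = sum(m_i * d_i for m_i, d_i in zip(M, matvec(diff, M)))
    F = 1.0 / math.sqrt(1.0 - quad)        # equation (3): F = S_y / s_y
    cM = matvec(c_xx, M)
    return [cM[i] * F / s_x[i] for i in range(n)]


# Hypothetical two-selector example.
C_xx = [[100.0, 3.0], [3.0, 0.25]]
c_xx = [[64.0, 1.5], [1.5, 0.16]]
R_xy = [0.35, 0.30]
print(restricted_validities(C_xx, c_xx, R_xy))
```

With a single explicit selector the function reduces to the familiar univariate range restriction formula run in reverse, and when the restricted and unrestricted covariance matrices are identical it returns the unrestricted validities unchanged.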



